fix device_id bug for final_state op in multiprocess testcase #41407

pangyoki · 2022-04-05T03:06:30Z

PR types

Bug fixes

PR changes

Others

Describe

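Problem
In the distributed multi-process unit test test_eager_dist_api.py, using a final-state op raises an error when GetDeviceContextByBackend fetches the device.
Cause
In the distributed multi-process scenario, when the child process for gpu1 runs, GetCurrentDeviceId returns place0, but place1 is expected, so DeviceContextPool cannot Get the corresponding place.
Fix
Before the new dynamic graph mode executes a kernel, SetDeviceId must be called to set the device to place.device in advance.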
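
The mismatch described above can be illustrated with a small toy model. This is only a sketch, not Paddle code: `ToyDeviceContextPool`, `get_current_device_id`, `set_device_id`, and `run_kernel` are hypothetical stand-ins that loosely mirror the names in the description, and the assumption that the gpu1 child process holds a context only for device 1 is made purely for illustration.

```python
# Toy model only: none of these names are real Paddle APIs.
_current_device_id = 0  # assumed default reported by a fresh process


def get_current_device_id():
    """Return the device id this (toy) process currently reports."""
    return _current_device_id


def set_device_id(device_id):
    """Pin the (toy) process to the given device id."""
    global _current_device_id
    _current_device_id = device_id


class ToyDeviceContextPool:
    """Holds contexts only for the devices assigned to this process."""

    def __init__(self, device_ids):
        self._contexts = {i: f"ctx(gpu:{i})" for i in device_ids}

    def get(self, device_id):
        if device_id not in self._contexts:
            raise KeyError(f"no device context for gpu:{device_id}")
        return self._contexts[device_id]


def run_kernel(pool, place_device_id, set_device_first):
    """Look up the device context the way the description outlines."""
    if set_device_first:
        # The fix: select the tensor's device before the lookup.
        set_device_id(place_device_id)
    return pool.get(get_current_device_id())


# Child process for gpu1: its tensors live on device 1.
pool = ToyDeviceContextPool([1])
try:
    run_kernel(pool, place_device_id=1, set_device_first=False)
except KeyError as err:
    print("without SetDeviceId:", err)
print("with SetDeviceId:", run_kernel(pool, place_device_id=1, set_device_first=True))
```

In this model the lookup fails because the current device id still reports 0, and it succeeds once the device id is set from the place before the lookup.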

paddle-bot-old · 2022-04-05T03:06:42Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.


support final_state in multiprocess

8be37f5

pangyoki added 2 commits April 5, 2022 08:34

fix no place.device

a0835a1

set device_id in eager_gen

3537f12

pangyoki changed the title ~~support final_state op in multiprocess testcase~~ fix device_id bug for final_state op in multiprocess testcase Apr 6, 2022

pangyoki closed this Apr 6, 2022

pangyoki reopened this Apr 6, 2022

pangyoki closed this Apr 6, 2022

pangyoki reopened this Apr 6, 2022

chenwhql approved these changes Apr 6, 2022


pangyoki merged commit b25f25d into PaddlePaddle:develop Apr 6, 2022

pangyoki mentioned this pull request Apr 6, 2022

【Cherry-pick-PR41407】fix device_id bug for final_state op in multiprocess testcase #41475

Merged

lanxianghit pushed a commit that referenced this pull request Apr 7, 2022

Cherry-pick-PR41407, fix device_id bug for final_state op in multipro…

7d143a4

…cess testcase (#41407) (#41475) Cherry-pick PR #41407

douch pushed a commit to douch/Paddle that referenced this pull request Apr 10, 2022

fix device_id bug for final_state op in multiprocess testcase (Paddle…

fb4069a

…Paddle#41407) * support final_state in multiprocess * fix no place.device * set device_id in eager_gen
